Duplicate record elimination in large data files

نویسندگان

چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Redundancy Elimination Within Large Collections of Files

Ongoing advancements in technology lead to everincreasing storage capacities. In spite of this, optimizing storage usage can still provide rich dividends. Several techniques based on delta-encoding and duplicate block suppression have been shown to reduce storage overheads, with varying requirements for resources such as computation and memory. We propose a new scheme for storage reduction that...

متن کامل

Duplicate Data Elimination in a SAN File System

Duplicate Data Elimination (DDE) is our method for identifying and coalescing identical data blocks in Storage Tank, a SAN file system. On-line file systems pose a unique set of performance and implementation challenges for this feature. Existing techniques, which are used to improve both storage and network utilization, do not satisfy these constraints. Our design employs a combination of cont...

متن کامل

A knowledge-based approach for duplicate elimination in data cleaning

Existing duplicate elimination methods for data cleaning work on the basis of computing the degree of similarity between nearby records in a sorted database. High recall can be achieved by accepting records with low degrees of similarity as duplicates, at the cost of lower precision. High precision can be achieved analogously at the cost of lower recall. This is the recall–precision dilemma. We...

متن کامل

Modeling Uncertainty in Duplicate Elimination

Real-world databases experience various data quality problems of different causes including heterogeneity of consolidated data sources, imprecision of reading devices, and data entry errors. Existence of duplicate records is a prominent data quality problem. The process of duplicate elimination often involves uncertainty in deciding on the true duplicates. Current tools resolve such uncertainty...

متن کامل

Sleep Sets Meet Duplicate Elimination

The sleep sets technique is a path-dependent pruning method for state space search. In the past, the combination of sleep sets with graph search algorithms that perform duplicate elimination has often shown to be error-prone. In this paper, we provide the theoretical basis for the integration of sleep sets with common search algorithms in AI that perform duplicate elimination. Specifically, we ...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: ACM Transactions on Database Systems

سال: 1983

ISSN: 0362-5915,1557-4644

DOI: 10.1145/319983.319987